class: center, middle, inverse, title-slide

# Lecture 13
## Models for Factorial Designs
### Psych 10 C
### University of California, Irvine
### 04/27/2022

---

## Models for factorial designs

- In the previous class, we talked about two different models for factorial designs.

--

- The **Null** model, which formalizes the assumption that the combinations of our factors (groups) have no effect on the expectation of our dependent variable (observations).

--

- The Null model is expressed formally as:

`$$y_{ijk}\sim\text{Normal}(\mu,\sigma_0^2)$$`

- Where `\(i\)` represents the observation number within the combination of the *j-th* level of factor 1 and the *k-th* level of factor 2.

--

- The second type of model we covered was the **Main effects** model. Main effects models assume that the expected value of our dependent variable differs between the levels of a single factor, regardless of the values of the other factors in the experiment.

--

- The number of Main effects models that we will have depends on the number of independent factors that we have.

---

## Models for factorial designs

- As we saw last class, a Main effects model for factor `\(j\)` was expressed as:

`$$y_{ijk}\sim\text{Normal}(\mu+\alpha_j,\sigma_1^2)$$`

--

- While the main effects model of factor `\(k\)` is:

`$$y_{ijk}\sim\text{Normal}(\mu+\beta_k,\sigma_2^2)$$`

--

- We will only work with `\(2\times 2\)` factorial designs for now, so these are the only two main effects models that we need.

--

- Remember that we use a different effects variable (`\(\alpha_j\)` for factor `\(j\)` and `\(\beta_k\)` for factor `\(k\)`) because we will use those variables for another model.

--

- Today, we will introduce the remaining two models that we use in a factorial design.

---

## The additive model

- An additive model formalizes the assumption that two (or more, depending on the number of independent variables) factors have an effect on the expected value of our dependent variable.
Furthermore, this type of model assumes that those effects are independent and can therefore be added together in order to make a prediction.

--

- We express an additive model formally as:

`$$y_{ijk}\sim\text{Normal}(\mu+\alpha_j+\beta_k,\sigma_3^2)$$`

--

- Where `\(\mu\)` is the grand mean, `\(\alpha\)` represents the main effect of factor 1, and `\(\beta\)` represents the main effect of factor 2. In this case, the effects are added in order to make a prediction.

--

- This means that the model prediction for a participant exposed to the combination of the *j-th* level of factor 1 and the *k-th* level of factor 2 will be:

`$$\mu_{jk} = \mu + \alpha_j + \beta_k$$`

---

## Example: anxiety by cohort and stats class

- Problem: we want to study the effect of the cohort that a student belongs to, and of whether they took a statistics class during their first year, on their anxiety levels.

--

- We have a `\(2\times2\)` between subjects factorial design where the first factor is cohort (2019 `\(j=1\)` vs 2020 `\(j=2\)`) and the second factor is taking a statistics class `\((k=1)\)` vs taking any other class `\((k=2)\)`.

--

- Then, the predicted anxiety level of any student in the 2019 cohort that took a statistics class would be:

`$$\hat{\mu} + \hat{\alpha}_1 + \hat{\beta}_1$$`

--

- Where `\(\hat{\mu}\)` represents the estimator of the grand mean, and `\(\hat{\alpha}_1\)` and `\(\hat{\beta}_1\)` represent the main effects of cohort and statistics class respectively.
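--

- As a minimal numeric sketch of that prediction (the estimates below are made-up values, purely for illustration):

```r
# Hypothetical estimates (made-up values for illustration only)
mu_hat    <- 50          # grand mean anxiety level
alpha_hat <- c(-5, 5)    # cohort effects: 2019 (j = 1), 2020 (j = 2)
beta_hat  <- c(4, -4)    # class effects: stats (k = 1), other (k = 2)

# Additive prediction for a 2019 student who took statistics (j = 1, k = 1)
mu_hat + alpha_hat[1] + beta_hat[1]
#> [1] 49
```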
--

- The predicted anxiety level of any student in the 2020 cohort that took a statistics class would be:

`$$\hat{\mu} + \hat{\alpha}_2 + \hat{\beta}_1$$`

---

## Visualizing the predictions of additive models

- As we did with main effects models, we can also make a visual representation of the predictions of an additive model:

--

.pull-left[

```r
plot(x = 0, y = 0, axes = FALSE, ann = FALSE, type = "n",
     xlim = c(0,1), ylim = c(0,1))
box(bty = "l")
segments(x0 = c(0.1,0.1), y0 = c(0.1,0.6),
         x1 = c(0.9,0.9), y1 = c(0.4,0.9),
         col = c("#c80064","#54bebe"), lwd = 3)
axis(side = 1, at = c(0.1,0.9), labels = c("2019", "2020"), cex.axis = 1.7)
segments(x0 = 0.12, y0 = 0.5, x1 = 0.88, y1 = 0.5,
         col = "#555555", lwd = 2, lty = 2)
mtext(text = "Anxiety level", side = 2, cex = 2, line = 0.5)
legend("topleft", legend = c("No stats","Stats", "grand mean"),
       col = c("#c80064","#54bebe","#555555"), lwd = 2, cex = 1.4, bty = "n")
```
]

.pull-right[
<img src="data:image/png;base64,#lec-13_files/figure-html/add-pred-graph-out-1.png" style="display: block; margin: auto;" />
]

---

## Additive model

<img src="data:image/png;base64,#lec-13_files/figure-html/add-2-pred-graph-1.png" style="display: block; margin: auto;" />

---

## Additive model

- Once we have obtained the values of the parameters of each of the main effects models, we already have everything that we need for the additive model.

--

- In other words, for the additive model we just need to combine the estimators of each main effects model in order to derive a prediction.

--

- This is because each factor is assumed to affect the response variable independently of the other.

--

- That is, each prediction of the additive model is just the grand mean plus the sum of the main effects.

--

- The last model in a `\(2\times2\)` between subjects factorial design is known as the **Full** model, and it assumes that the levels of each factor can **interact**.
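---

## Additive model: predictions in R

- As a sketch, given estimates from the two main effects models (the numbers below are made up for illustration), the additive predictions for all four groups can be computed at once:

```r
# Hypothetical main-effects estimates (made-up values for illustration only)
mu_hat    <- 50
alpha_hat <- c("2019" = -5, "2020" = 5)     # cohort effects
beta_hat  <- c("Stats" = 4, "Other" = -4)   # class effects

# Additive prediction for each cell: mu + alpha_j + beta_k
mu_hat + outer(alpha_hat, beta_hat, `+`)
#>      Stats Other
#> 2019    49    41
#> 2020    59    51
```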
---

## Full model

- The **Full** model in factorial designs assumes that the expected value of our dependent variable is different for each combination of the levels of our factors.

--

- Furthermore, it assumes that those expectations are independent of one another and that they can't be obtained by simply adding together the main effects of each factor.

--

- This is typically known as an **interaction**. The idea is that one of the independent variables in our experiment can *modify* the effect of the other.

--

- In our anxiety example, an interaction would occur if taking a statistics class has no effect on the anxiety levels of students in the 2019 cohort but, for students in the 2020 cohort, taking a statistics class increases their anxiety level.

--

- In other words, the effect of taking a statistics class is different for students in the 2019 cohort and students from 2020.

--

- Maybe because they had a different instructor?

---

## Visualization of an interaction

- There are multiple ways in which we could represent a model with an interaction. However, the key idea is that, instead of having parallel lines as in main effects or additive models, this time the lines will not be parallel.
.pull-left[

```r
plot(x = 0, y = 0, axes = FALSE, ann = FALSE, type = "n",
     xlim = c(0,1), ylim = c(0,1))
box(bty = "l")
segments(x0 = c(0.1,0.1), y0 = c(0.4,0.4),
         x1 = c(0.9,0.9), y1 = c(0.4,0.9),
         col = c("#c80064","#54bebe"), lwd = 3)
axis(side = 1, at = c(0.1,0.9), labels = c("2019", "2020"), cex.axis = 1.7)
mtext(text = "Anxiety level", side = 2, cex = 2, line = 0.5)
legend("topleft", legend = c("No stats","Stats"),
       col = c("#c80064","#54bebe"), lwd = 2, cex = 1.4, bty = "n")
```
]

.pull-right[
<img src="data:image/png;base64,#lec-13_files/figure-html/full-pred-graph-out-1.png" style="display: block; margin: auto;" />
]

---

## Example: Interaction

- Another example of an interaction: the anxiety level of students in the 2019 cohort is high for those who took a statistics class and low for those who did not, while the anxiety level of students in the 2020 cohort is low for those who took a statistics class and high for those who did not.

--

.pull-left[

```r
plot(x = 0, y = 0, axes = FALSE, ann = FALSE, type = "n",
     xlim = c(0,1), ylim = c(0,1))
box(bty = "l")
segments(x0 = c(0.1,0.1), y0 = c(0.1,0.9),
         x1 = c(0.9,0.9), y1 = c(0.9,0.1),
         col = c("#c80064","#54bebe"), lwd = 3)
axis(side = 1, at = c(0.1,0.9), labels = c("2019", "2020"), cex.axis = 1.7)
mtext(text = "Anxiety level", side = 2, cex = 2, line = 0.5)
legend("top", legend = c("No stats","Stats"),
       col = c("#c80064","#54bebe"), lwd = 2, cex = 1.4, bty = "n")
```
]

.pull-right[
<img src="data:image/png;base64,#lec-13_files/figure-html/full2-pred-graph-out-1.png" style="display: block; margin: auto;" />
]

---

## Full model

- The key idea of a **Full** model is that it predicts that the effect of a factor on the expected value of our dependent variable changes depending on the values that the other factor takes.
--

- In our first example, the effect of taking a statistics class on students' anxiety levels was different for students in the 2019 cohort in comparison to students in the 2020 cohort.

--

- In other words, taking a statistics class had no effect on anxiety levels for students in the 2019 cohort, but it increased anxiety levels for students in the 2020 cohort.

--

- In the second example, for students in the 2019 cohort, having taken a statistics class increased their anxiety levels. On the other hand, students from the 2020 cohort had lower anxiety levels if they had taken a statistics class in comparison to students that did not.

--

- In other words, the effect of taking a statistics class on students' anxiety levels was different for students in the 2019 cohort in comparison to students in the 2020 cohort.

---

## Full model

- The **Full** model formalizes the assumption that the expectation of the dependent variable depends on the combination of factor levels, and that it is not the sum of independent effects.

--

- Formally, we can express the full model as:

`$$y_{ijk}\sim\text{Normal}(\mu_{jk},\sigma_4^2)$$`

--

- Notice that in the Full model, each combination of the levels of our factors `\(j\)` and `\(k\)` has a different expectation `\(\mu_{jk}\)`. However, this expectation can no longer be expressed as the addition of a grand mean and main effects.

--

- This model is similar to the effects models that we have talked about before, as it assumes that the predictions for each group (in a between subjects factorial design) are different.

---

## Estimators for the full model

- The **Full** model is simple to estimate but hard to interpret.

--

- The estimator of `\(\mu_{jk}\)` is equal to the average of the group that was exposed to the *j-th* and *k-th* levels of our first and second factor respectively.
In other words, it is equal to:

`$$\hat{\mu}_{jk} = \frac{1}{n_{jk}} \sum_i y_{ijk}$$`

--

- Where `\(n_{jk}\)` represents the number of participants that were exposed to the *j-th* level of factor 1 and the *k-th* level of factor 2.

--

- In our anxiety example, `\(\hat{\mu}_{11}\)` would be the average anxiety level of students in the 2019 cohort that took a statistics class during their first year.

--

- While `\(\hat{\mu}_{21}\)` would be the average anxiety level of students in the 2020 cohort that took a statistics class during their first year.

---

## Number of models to compare on each design

- When we have a `\(2\times2\)` between subjects factorial design (like in our anxiety example), we need to evaluate the following models:

--

- One **Null** model, which assumes all groups have the same expected value of the dependent variable.

--

- Two **Main effects** models, each of which assumes that one and **only one** of the factors has an effect on the expected value of our dependent variable.

--

- One **Additive** model, which assumes that the expected value of our dependent variable is equal to the sum of the grand mean and the independent effects of the levels of each factor.

--

- One **Full** model, which assumes that the expected value of our dependent variable is different for each combination of the levels of our factors and can't be expressed as the sum of independent effects.

--

- This means that in a `\(2\times2\)` between subjects factorial design we will have to calculate the predictions and errors of 5 different models.
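---

## Estimating the full model in R

- As a minimal sketch, the cell-mean estimators `\(\hat{\mu}_{jk}\)` can be computed directly from the data. The data frame below is simulated (made-up values), purely for illustration:

```r
set.seed(13)

# Simulated balanced 2x2 design: 5 students per cohort-by-class cell
d <- data.frame(
  cohort  = rep(c("2019", "2020"), each = 10),
  class   = rep(c("Stats", "Other"), each = 5, times = 2),
  anxiety = rnorm(20, mean = 50, sd = 8)
)

# Full-model estimators: the mean of each cohort-by-class group
tapply(d$anxiety, list(d$cohort, d$class), mean)
```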
---

## Number of models in designs with 3 factors

- The number of models that we need to compare increases rapidly with the number of factors (independent variables) in the experiment. For example, if we have a `\(2\times 2 \times 2\)` between subjects factorial design, we will have:

--

- One **Null** model and three **Main effects** models (one for each factor).

--

- Four **Additive** models: three of them use only two factors at a time (factor 1 with factor 2, factor 1 with factor 3, or factor 2 with factor 3), and the last one is the additive model of all 3 factors at the same time.

--

- Finally, there will be a single **Full** model.

--

- Because the **Full** model in designs with 3 factors can be so difficult to interpret, we usually don't take it into account.

--

- Again, when we increase the number of factors in an experiment, it is better to have hypotheses that can inform which models are relevant to look at, instead of testing every model.

---

## Factorial designs

- The equations that we have studied over the last two classes will work for every between subjects factorial design. The estimators we have talked about can always be obtained in order to derive a model's predictions and their errors.

--

- However, there is a "short" way to obtain all of the values that we need.

--

- This method is known as **Cell means**.

--

- The cell means method will allow us to get the **grand mean** `\(\mu\)`, the main effects `\(\alpha_j\)` and `\(\beta_k\)`, and the group means for the full model `\(\mu_{jk}\)`.

---

## Cell means method

- When we have a `\(2\times 2\)` between subjects factorial design, we can obtain the parameters that we need for all our models by using a cell means table.

--

- This method is only applicable if we have the same number of observations (participants) in every group in the experiment; otherwise the estimates will not be adequate.
--

- A cell means table is a matrix representation of the combinations of the levels of our factors. In the case of a `\(2\times 2\)` between subjects factorial design, we would have the levels of one factor as the rows and the levels of the second factor as the columns.

--

- Let's look at an example using our student anxiety problem:

---

## Students anxiety

- First we start with the parameters of the full model, which are easy to find given that they are just the average of each combination of the factor levels.

- With those values we can obtain the average of the levels of each factor by adding the values in a column (or row) and dividing the result by 2.

- Finally, we can obtain the grand mean by taking the average of the four inner cells of the matrix (the cohort-statistics combinations).

<br>

|          | Statistics                | Other                     | Mean                     |
|----------|:-------------------------:|:-------------------------:|:------------------------:|
| **2019** | `\(\hat{\mu}_{11}\)`      | `\(\hat{\mu}_{12}\)`      | `\(\hat{\mu}_{1\cdot}\)` |
| **2020** | `\(\hat{\mu}_{21}\)`      | `\(\hat{\mu}_{22}\)`      | `\(\hat{\mu}_{2\cdot}\)` |
| **Mean** | `\(\hat{\mu}_{\cdot 1}\)` | `\(\hat{\mu}_{\cdot 2}\)` | `\(\hat{\mu}\)`          |
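---

## Cell means table in R

- As a sketch of the cell means method, `addmargins()` can append the row means, column means, and grand mean to a matrix of cell means. The cell values below are made up for illustration:

```r
# Hypothetical cell means (made-up values): rows = cohort, cols = class
cell_means <- matrix(c(54, 46,
                       46, 54),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("2019", "2020"),
                                     c("Statistics", "Other")))

# Append marginal means; the bottom-right "mean" entry is the grand mean
addmargins(cell_means, FUN = mean)
```

- With a balanced design, the row and column margins are the factor-level means `\(\hat{\mu}_{j\cdot}\)` and `\(\hat{\mu}_{\cdot k}\)`, and the corner entry equals `\(\hat{\mu}\)`.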